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Preface 


This book of proceedings gathers the contributions presented at the 6th 
URV Doctoral Workshop in Computer Science and Mathematics. This edition 
has been held in Tarragona (Catalonia, Spain) in 2020. It has been jointly 
organized by the research group Didactics of Mathematics (DIDACMAT) and 
the Doctoral Program on Computer Science and Mathematics of Security of 
Universitat Rovira i Virgili (URV). The main aim of this workshop is to pro- 
mote the dissemination of the ideas, methods and results that are developed in 
the Doctoral Thesis of the students of this doctorate program, and to promote 
the knowledge, collaboration and discussion between their respective research 
groups. 

The workshop had two invited talks, seven oral presentations and six 
posters. The first invited talk was given by Prof. Ernest Valveny, who is an 
associate professor at Universitat Autonoma de Barcelona and a researcher 
at Computer Vision Center; the talk revised the recent developments in text 
detection and recognition and explored the role of text in semantic image in- 
terpretation. The second invited talk was given by Prof. Virginia Larraz and 
Prof. Célia Bar6é, who are associate professors and researchers at Universi- 
tat d’Andorra. This talk presented the new educational model they designed 
and implemented in their University. This is an educational model completely 
based on the acquisition of a set of competences. 

In this book, the reader will find the contributions of the thirteen Ph.D. 
students that presented their works in the Workshop. Each chapter presents 
the current research work of a doctoral student, the goals and some prelimi- 
nary results. Contributions were framed in a variety of research lines, which 
include security and privacy in computer systems, artificial intelligence, med- 
ical informatics, hardware architectures, complex networks and mathematics. 
All contributions present innovative proposals, methods or applications, with 
the aim of opening new and strategic research lines. The editors and organizers 
invite you to contact the authors for more detailed explanations and encourage 
you to send them your suggestions and comments, which will certainly help 
them in their PhD theses. 


The members of the organizing committee were Dr. Carme Julia, Dr. Aida 
Valls (Coordinator of the Ph.D. program), Mr. Jaime Segarra, Mrs. Anna 
Santo and Mrs. Cristina Nieto. We would like to thank the invited speakers 
for such interesting talks. Second, we thank all the participants and, espe- 
cially, the students that presented their work in this DCSM workshop. Finally, 
we also want to thank Universitat Rovira i Virgili (URV), the Departament 
d’Enginyeria Informatica i Matematiques (DEIM) and the Escola Técnica Su- 
perior d’Enginyeria (ETSE) for their support. 


Dr. Carme Julia and Dr. Aida Valls (Editors) 
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Cleaning, improving and validating the ICUs 
patients database in Hospital Joan X XIII for 
secondary use 


David Cuadrado* 


Department of Computer Engineering and Mathematics, Universitat Rovira i 
Virgili. Tarragona, Spain 
david. cuadrado@estudiants.urv.cat 


1 Introduction 


The prediction of Days to discharge (DTD) is necessary for the proper man- 
agement of an Intensive Care Unit (ICU), DTD predictions made by medical 
staff does not always are accurate [1]. Machine Learning could be used to cre- 
ate refined predictive models for DTD. The ICU of the Hospital Joan XXIII 
in Tarragona has an information system with data about all the admitted pa- 
tients that combine data which is collected instantaneously with other data 
which is collected daily or once in the patient stay. A database with the infor- 
mation of patients in 2014-2019 was extracted and it is intended for obtaining 
DTD predictive models, but before that, it must first be debugged. This ar- 
ticle describes the most relevant parts of the debugging process carried out, 
specifically the correction of erroneous values and the treatment of missing 
values. 


2 Objective 


The ICU database is extracted directly from the information system to 
ensure the anonymity of the patients for secondary use. Within the database, 
errors or missing values can affect the quality of the predicting models. The 
objective of this study is to obtain a debugged database by detecting and 
correcting erroneous values and reducing missing values. 


3 Database description 


The content of the database and its quality must be studied and classified 
before debugging. 


* PhD advisor: David Riafio, Josep Gomez, Alejandro Rodriguez, Maria Bodi 
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3.1 Contents 


The ICU database includes patients admitted in the ICU of the Hospital 
Joan XXIII during the 2014-2019 period. There are 5528 patients represented 
in 47192 days of total treatment. These patients have 47 attributes associ- 
ated: PatientID, Age, Gender, DisYear (Age of discharge), PatType (Type of 
patient), AdmType (Type of admission), AdmWardGroup (Group of origin 
of admission), Outcome, PrevHospDays (Days in hospital before admission), 
APACHEAdmGroup (Group of admission APACHE), PrincipalDiagG (Main 
Diagnosis), TotalLOS (Total lenght of stay), APACHE-II (Patient Gravity 
Rating System), CHE (Patient Cormobility), TSA (Time since admission), 
T2D (Time To Discharge), AE (External Factors), CA (Arterial Catheter), 
SU (Urinary Probe), CVC (Central Venous Catheter), Insuline, VA (Drugs 
vasoactive), SA (Sedoanalgesics ), ATF (Antifungals), ATB (Antibiotics), 
VMI (Invasive Ventilation), VMNI (Non-invasive Ventilation), Isol (Isolation), 
LTSV (Life Support Limitation), MAP (Blood Pressure), HR (Heart Rate), 
Tmp (Temperature), Glu min, max and std (Minimum, Maximum glycemia 
and Standard deviation) NAS (Nursing Activity Score), EMINA (Pain scale), 
STRATIFY (Fall risk), SOFA (Gravity score of internal organs. There are 7 
types), Data (Discharge request) and CAMICU (status of internal organs). 

From the clinical point of view there are five types of attributes: Demo- 
graphic (Age, Sex, Gender, ID, and DisYear), type of case (Pattype, Admtype, 
AdmWardGroup, APACHEAdmGroup, and PrincipalDiagG), clinical descrip- 
tions (CA, SU, CVC, Insuline, VA, SA, ATF, ATB, VMI, VMNI, Isol, LTSV, 
MAP, HR, Tmp, Glucose min, max, and standard desviation), scores (SOFAS, 
CHE, NAS, EMINA, STRATIFY, CAMICU, and APACHE-II) and organiza- 
tional (PrevHospDays, TSA, LOS, T2D, and Outcome). Of these attributes 
there are 21 which are numerical, 12 categorical, and 14 binary. There are 14 
attributes that contain data that is taken once at some point during the stay 
and 33 that are taken daily. 


3.2 Quality of the original ICU database 


In the original ICU database there are three types of error: erroneous values, 
impossible values, and anomalous values. 

The erroneous values include 78 patients younger than 18 and 2 patient 
with unknown gender. There are also 37 patients registered as unknown type 
of patient. 

The impossible values found include 121 patients with less than zero days 
in the hospital, prior to the admission in the ICU. There are also 925 days in 
which blood pressure is lower than zero. 

The anomalous values involve three patients with an Admission Group 
tagged as zero, 791 patients with a length of stay longer than 14 days, and 172 
patients with an APACHE score bigger than 37. The category AE (External 
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factors) is different to zero in only 1332 days, the Blood Pressure has values 
outside the range of [60,90] mmHg for 67 patients, and the Tmp has values 
outside the range of [33,44] °C in 8 days. 

A missing value represents a data of the patient that has not been taken [2]. 
A first study allowed us to observe whether the missing values are concentrated 
in one year or if they are distributed over the years. All years have a similar 
proportion of missing values. 

Missing values in the original database are concentrated in the next vari- 
ables: MAP, HR, Tmp, Glu (min, max, std), NAS, EMINA, STRATIFY, 
CAMICU, and DATA. There are 105,809 missing values in the database. 


4 Improving the database for secondary use 


Erroneous and missing values received different treatments according to 
each case. 


4.1 Detecting and correcting erroneous data 


There are 78 patients younger than 18 and only one patient with unknown 
gender. They are removed from the database. For the 37 patients with PatType 
categorized as "Unknown", if the admission group was tagged as ”Postopera- 
tive” or ’Other non specific surgeries” the PatType was imputed as "Surgical", 
otherwise as "Medical". 

The impossible negative values of the attribute PrevHospDays observed for 
121 patients were set to zero. Negative blood pressure observed in 925 days for 
different patients were replaced with the values result from the interpolation 
between the days before and after. The three patients with an Admission 
Group tagged as zero were removed from the database. The attribute "LOS" 
(Length of stay) is useless for DTD analysis, therefore it is removed. 


4.2 Reducing missing values 


Missing values are distributed along the 2014-19 years with percentages 
4.89 %, 4.96 %, 5.11 %, 5.32 %, 4.56 %, and 3.82 %. Only 11 attributes 
contain missing values, and only two of them (CAMICU, Data) register 86.32 
% and 99.98 % of missing data, respectively. They were removed. CAMICU 
and Data attributes were also removed due to the huge amount of missing 
values. The other attributes have less than 4 % missing values, except NAS 
that has 12.74 %. They are kept and their missing values are imputed. Two 
imputation mechanisms were used: interpolation and replication. 

If the value of an attribute is missing but known in the days before and 
after (for a patient), these last two values were averaged and the result used 
to replace the missing value in the intermediate day. If only one of the values 
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in the day before or after is known, it is replicated in the intermediate missing 
value. 


5 Results and future work 


In the debugging process, we eliminated four attributes (LOS, AE, CAM- 
ICU, and Data), we introduced 425 new missing values, corresponding to 124 
patients, but we also corrected a considerable number of erroneous values (2110 
values corresponding to 172 patients). Table 1 shows the compared number of 
days, patients, attributes, and missing data in the UCI database before and 
after our debugging. 


Previous Final 

N (days)= 47192 46684 

N (patients)= 5529 5445 
N attributes= 47 43 

N missing= 105809 17623 

% missings— 4.77% 0.88% 


Table 1: Days, patients, attributes, and missing data in the UCI database 


Some attributes such as MAP, AE, and Tmp may require a more refined 
debugging that could potentially improve the current results. In the future, 
this work will allow us to apply machine learning methods to extract models 
to predict DTD of patients in an ICU to improve the current prediction of 
practitioners. 
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Optimizing CNWNs first layer with respect to color 
encoding 


Joao Paulo Schwarz Schiiler * 


Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili 
Tarragona, Spain 
joaopaulo.schwarz@estudiants.urv.cat 


1 Introduction 


In 1989, LeCun et al. [7] devised the first Convolutional Neural Network 
(CNN), which mimicked the organization of neural cells in the visual cortex 
as convolutional filters. This new type of neural network was able to recognize 
10 digits in hand-written text very accurately. The majority of existing CNN 
models deal with the basic Red-Green-Blue (RGB) color values from input 
pixels. Despite this is the obvious choice taking into account that digital images 
are usually encoded with RGB, it’s curious that very few researchers have 
attempted to train their networks on images encoded with other color spaces 
such as Hue-Saturation-Lightness (HSL) or CIE-LAB, the definition of which 
are vastly known and long-standing in the fields of color perception [2] and 
colorimetry [6]. The rationale behind trying other color spaces than RGB is 
based on evidences that the human color vision transforms the initial neural 
signals from cones and rods into an opponent color model [5], where several 
layers of neurons convert the Short, Medium and Large wavelength neural 
signals, loosely related to blue, green and red hues, into other neural signals. In 
regards to the human color perception [2], these opponent signals are further 
processed and converted into perceptual color components, named as Hue, 
Saturation and Lightness. There are several computational models that convert 
RGB into HSL-related components, for example, Smith’s HSI [3] and Yagi’s 
HSV [4]. 


2 Materials and methods 


In order to check our hypothesis, we will perform image classification exper- 
iments on the CIFAR-10 dataset [1], which consists of 60k 32x32 RGB labelled 
images, belonging to 10 different classes: airplane, automobile, bird, cat, etc. 
These images are taken from natural and uncontrolled lightning environment, 
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contain only one prominent instance of the object to which the class refers, 
and the object may be partially occluded or seen from an unusual viewpoint. 
We aim to explore the simplest CNN able to obtain a reasonable test accuracy 
(above 80%) in the CIFAR-10 image classification task, in order to compare 
its behavior (accuracy variation, patterns in first layer filter, etc.) with various 
color encodings, from the basic RGB to HSL or LAB, as well as the bare gray 
level value of pixels (colorless images). 


3 Experiments 


As a baseline, we defined a single-branch CNN architecture small enough to 
classify CIFAR-10 dataset with at least 80% test accuracy. This single-branch 
architecture is shown in figure 1. 


Fig. 1: Graphical representation of the single-branch baseline CNN architec- 
ture. 


One of the purposes of our research is to create an architecture that takes 
advantage of separated chromatic and achromatic channels, which are readily 
available in color spaces such as CIE-LAB or HSV, as explained in the intro- 
ductory section. To this aim, we propose to create two separate paths for the 
first convolutional layer, each one dedicated to each type of pixel information 
(achromatic/chromatic), in order to specialize the first layer filters of the CNN 
to the mentioned aspects of the scene (light variations, object boundaries). We 
hypothesize that this specialization may lead to better object identification, 
as a consequence of a more object-related representation of the image content. 
Figure 2 shows the proposed two-branch architecture, where the top branch 
processes the single achromatic channel while the bottom branch processes 
the two chromatic channels. For example, we can convert RGB into CIE-LAB 
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color encoding, hence the L channel is fed into the top branch, while the AB 
channels are fed into the bottom branch. 


Fig. 2: Graphical representation of the single-branch baseline CNN architec- 
ture. 


4 Results 


As shown in the table 1, our baseline RGB model obtained 84.4% accu- 
racy with 15.5 million floating point operations on the forward pass while our 
two-paths model obtained 84.7% accuracy with 11.7 million flops meaning a 
reduction about 29% in the required forward pass computation. 


model color space accuracy million flops 
baseline RGB 84.4% 16.5 
two-paths LAB 84.7% 11.7 


Table 1: RGB baseline and LAB two-paths results. 


5 Conclusions 


By splitting LAB filter values into two branches, one for L and another for 
AB, we can force a CNN to find prototypical sets of achromatic/chromatic 
filters allowing the CNN to achieve similar accuracy while decreasing the re- 
quired computation.In essence, we have devised a modification of the first 
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layer of a CNN into two branches, which optimizes the number of weights 
when dealing with a color encoding that separates achromatic from chromatic 
channels, such as LAB, HSL, etc. Although the proposed architecture does not 
increase the validation accuracy significantly, it points out that uncorrelating 
the input features eases the learning task of any CNN. As a future work in this 
line, we aim to find out other “correlations” in mid-level or high-level layers, 
hence we may be able to specialize the network neurons to different types of 
information. 
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State Support for Serverless Cloud Services 


Daniel Barcelona-Pons * 


Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili 
Tarragona, Spain 
daniel.barcelona@urv.cat 


1 Serverless 


The cloud has been a hot topic in recent years for both, industry and 
academia. Captivated by the simplification, cost reduction, and other benefits 
of cloud services, many companies have moved their applications and services 
to managed infrastructure in the cloud. The most relevant offerings are pro- 
vided by Amazon, Microsoft, Google, and IBM. The advantage of this model is 
the increasing level of abstraction the user faces (IaaS, CaaS, PaaS). With the 
each time more reduced complexity, more sectors have available the cloud’s 
compute capacity. 

However, the cloud is still evolving. The latest development in this line 
is serverless. This new model establishes a philosophy where a cloud service 
user is totally oblivious to the servers. That way, they do not need to worry 
about provisioning and managing servers and can focus on their applications. 
Another important property of serverless is its billing model. In a serverless 
service, the user is only charged for the exact resources they use, at very fine 
granularity (in the order of milliseconds and few megabytes). This allows users 
to pay only for the performance required at any particular moment, against 
paying for machines working all day—thus,avoiding charges for idle resources. 

The serverless model importantly emerged in 2014, with AWS Lambda [1]. 
This service establishes the basis of what is the most popular compute offer- 
ing in the serverless world: Function as a Service (FaaS). Academia quickly 
started to analyze this model and examine its potential and limitations [2]. 
In this system, the user codes functions; small pieces of code with clearly de- 
fined functionality, some input parameters and some output. The functions 
are deployed directly to the cloud and are run as response to events, on de- 
mand. Additionally, functions execute in isolated environments (containers, 
e.g. Docker), and after running, the environment disappears. This makes each 
execution independent of the resources it is actually running on, allowing cre- 
ating a new environment with the function for each event or invocation, hence 
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making the service highly scalable. As a serverless service, it allows a user to 
quickly execute code in the cloud by just providing it and triggering it, with- 
out needing to configure or manage any of the underlying resources, which 
are managed transparently by the cloud provider. Following the appearance of 
Amazon’s service, other cloud providers like Microsoft, Google, and IBM pre- 
sented their counterparts: Azure functions, Google Cloud Functions, and IBM 
Cloud functions. Each of those have their own characteristics and limitations, 
but they all follow the same FaaS scheme. 

FaaS is the main component of serverless computing, and usually confused 
with serverless in general. However, it is just a way of offering computation 
by following the serverless philosophy. For instance, serverless can also be 
applied to storage. There are storage services in the cloud, like Amazon S3 
and DynamoDB, that also offer pay-per-use with zero server management. 


1.1 Current state and open challenges 


Serverless, and more specifically FaaS, is a very good fit for many kinds of 
applications, e.g. web services and backend. We have seen them used to serve 
websites, or as substrate for web applications, as well as for authorization 
procedures. We have also examples of functions being used for background 
processes, like resizing uploaded images, or scheduled maintenance jobs. And 
they can be used for more interactive tasks too, like real time translations. 

Nonetheless, it is a new model still in evolution and its offer is not complete 
yet (cloud providers still add new features and tune their services [8] and there 
is a lot of academic movement [4,6,7,5] exploring the limits of the model). 
This early state of the model makes many applications face several issues 
when trying to benefit from it. Such applications include data analytics in 
Big Data, machine learning (ML) training, algebra and graph computations. 
Implementing them is either not possible or highly inefficient with the current 
serverless offer, but however desirable for many data scientists to scale their 
computations with ease in the cloud. This is mainly due to two limitations in 
the FaaS model, consequence of their ephemeral nature (functions are expected 
to run a simple task and return quickly, hence they having a tiny time limit— 
up to 15 minutes). 

First, function executions are ephemeral: stateless and non-addressable; 
they cannot store any kind of state between function invocations, since each 
one could run on different environments; and function invocations cannot be 
reached by other processes since they do not expose any endpoint. This thwarts 
the logic capabilities of functions and the implementation of iterative tasks that 
need to work continuously with the same data (e.g., ML training over a data 
set). Developers are forced to use external services to store their data reliably 
and allow sharing some state between functions to scale a computation. Unfor- 
tunately, current cloud storage services are not prepared to deal with this kind 
of workload, where rapid access and bandwidth are very important. Serverless 
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storage services like object stores are too slow for fine-grained state sharing, 
and in-memory stores are not prepared for the scale FaaS is able to achieve. 
Second, FaaS services do not provide means to coordinate and synchronize 
function executions. This is related to the fact that functions are not address- 
able. We have seen projects that see FaaS as an infinite computation cluster, 
where each function is a compute unit (e.g., PyWren [6] and ExCamera [4]). 
For this model to work, functions need to coordinate. Having information and 
being able to react when each of the tasks finishes, know when to run new 
ones, and create operation workflows, is an important requirement for such 
distributed applications to be developed easily in a serverless way. Currently, 
this is not directly supported in FaaS services and it is required to use external 
controllers with certain limitations: functions need to be called synchronously, 
by blocking the client, and data and results must be obtained by polling exter- 
nal storage; Due to this lack of integrated support, coordination cannot benefit 
from internal events to make orchestrations more automatic, reactive and scal- 
able. Several orchestration services have appeared, like AWS Step Functions, 
IBM Composer, and Azure Durable Functions, but they do not fulfill the re- 
quirements for fully reactive workflow management [5]. Recently, an extended 
discussion about serverless and its challenges (including the ones discussed 
here) has been published [7]. 
The research presented in this document chases these challenges. 


2 Serverless stateful computation 


This section gives a quick description of the current work performed to 
solve these challenges and that will be part of the PhD thesis. 

As a first step towards enabling highly-concurrent stateful computation for 
FaaS services, we designed Crucial. Its simple programming model allows to 
port effortlessly multi-threaded algorithms to serverless. Crucial is built upon 
the key insight that FaaS resembles to concurrent programming at the scale of 
a data center. Then, to share state at fine granularity and enable coordination 
between functions, a distributed shared memory layer is the right answer. 
Figure 1 shows the general architecture of the system, where we see how FaaS 
functions are considered traditional threads (run from a client application) 
and they all have access to the same in-memory store to share state. The first 
results of this work have been published recently [3]. 


Acknowledgement. Daniel Barcelona-Pons’s work is financed by a Marti i Franqués 
Programme grant (URV). This work has been partially supported by the EU Horizon 
2020 programme under grant agreement No 825184 and by the Spanish Government 
through project TIN2016-77836-C2-1-R. 
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Client 
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A CloudThread n 
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1) Function invocations 2) Access to shared state 


Fig. 1: A client application runs a set of threads on the FaaS layer; all of them 
have access to the same shared state at the distributed shared object (DSO) 
layer. 
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1 Introduction 


The advancements of machine learning in the current world are taking place 
in our daily lives: from music suggestions made from our playback history to 
the automotive industry [1]. These technologies have access to individual’s 
data for learning models, often in privacy-invasive ways, which causes a break 
to these technological advances. In order for the progress of this technology to 
be built efficiently, it is necessary to consider a balance between the technolog- 
ical advances and the usage of individual’s data. An alternative approach to 
machine learning, called decentralized federated learning [2,3], is hereby con- 
sidered where data is not collected, i.e. it leaves the training data distributed on 
the mobile devices and learns a shared model by aggregating locally-computed 
updates. 

In federated learning, a centrally located server sends a model to user de- 
vices, also known as clients, which can be in the range from hundreds to mil- 
lions depending on the user base of the application. Each client downloads the 
current model, improves it by learning from data on their respective device, 
and then summarizes the changes as a small focused update. All clients send 
their updates to the central server, using encrypted communication, where it 
is averaged with other user updates to improve the shared model. The new 
shared model is sent back to the devices and this cycle iterates until the model 
is improved. Federated learning enables sharing information between a client 
and server without compromising on user privacy. 

The algorithms in federated learning may use a central server that orches- 
trates the different steps of the algorithm and acts as a reference clock, or they 
may be peer-to-peer, where no such central server exists. 
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1.1 Current research 


With the increasing usage of machine learning techniques that rely heavily 
on data, there is a need to ensure fair practices that do not propagate biases. 
Fairness matters because it has impact on everyone’s benefit. 

The data provided by the mobile devices can be highly biased. If the train- 
ing data coming from some minority group is much less than those coming 
from a majority group, it is less likely to model perfectly the minority group. 
To prevent clients from minority groups to contribute to the learning process, 
which therefore yields models built on the data on the non-minority clients, we 
need to discard updates that differ too much from the rest. On the other hand, 
malicious attacks aiming at poisoning the learned model are also likely to pro- 
vide updates that differ very much from the mainstream. For that, we propose 
two different solutions to distinguish poisoners from genuine outliers/minority 
group members. 


1. The first one is based on microaggregation [4], which attempts to create 
clusters such that the published attributes of the genuine outliers in each 
cluster are maximally similar. 

2. The second solution is based on Gaussian Mixtures. 


These two proposals were implemented using Python and evaluated on the 
Adult data set obtained from the UCI Machine Learning repository [7]. We 
created three different kind of participants in the Adult data set, where one 
of them were cheaters. By including diverse clients, we strive to help learning 
less discriminatory machine learning models. For the first solution, we used 
the maximum distance to average vector (MDAV) algorithm, with which we 
were able to differentiate the cheaters from the rest of the participants. In the 
second solution, the expectation-maximization (EM) algorithm [5,6] has been 
employed and is currently under work. 
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1 Introduction 


Cancer sets up a massive burden on societies throughout the world and 
prostate cancer (PCa) is one of the most noncutaneous cancer in men and 
the second leading cause of death worldwide [1]. In spite of its small size, PCa 
grows in the peripheral zone of the gland and can extend to a significant length 
before involving male genitourinary system [2]. 

On account of the elastic properties of soft tissues, doctors routinely use 
palpation to distinguish lesions within the prostate. Because, cancerous tissues 
are more rigid than normal tissues [3]. However, manual palpation is difficult 
for the inside lesions when the elasticity values are similar between the tissues. 
In addition, it does not provide any statistical or numerical information, and 
the effectiveness depends on the skill and experience of the surgeons. Therefore, 
ultrasound-based elasticity-imaging methods and other elastography methods 
were developed to overcome those limitations. 

Patient-specific modeling is very crucial for successful diagnoses of cancer 
and treatments because of the dependence on the patient’s material proper- 
ties of the tissues, geometry, etc. For example, Wang et al. [4] focused on 
patient-specific biomechanical parameters for inner and outer prostate tissue 
by employing shear wave elastography and proposed a deformation model for 
estimating patient-specific prostate deformation during image-guided inter- 
ventions. 

Currently, transrectal ultrasound (TRUS) is used in the routine clinical 
practice for targeting the prostate lesions because it is safe and affordable but 
specialists are facing some challenges when performing TRUS-guided biopsy. 
One of them is how to track the location of prostate lesions, given the pressure 
of TRUS at the rectum side (see fig. 1: Pressure Face) and also have the 
deformation of the prostate and surrounding areas. Second of them is how to 
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fit the appropriate parameters of rectum, bladder, pelvic bones and especially 
for prostate tissues for realistic deformation model. 


2 Methodology and Results 


Three-dimensional finite element (FE) pelvis organs are constructed to de- 
lineate the effect of different organ geometries as shown in Figure 1. The 
simulations are done in Code_ Aster [5] open source software to see the elastic 
biomechanical behavior of pelvic organs. The surfaces of rectum, bladder and 
pelvic bones are discretized with linear thin-triangular shell elements while 
3D four-node tetrahedral voluminal elements are used for the inner and outer 
prostate. Kinematic boundary conditions are examined for sacrum and hips 
in terms of node displacements i.e. (Uz = 0, Uy = 0 and U, = 0). As stated 
above, the transrectal ultrasound (TRUS) guided biopsy is used to target the 
prostate lesion by applying pressure on one side of rectum. The Doctor presses 
with a transducer to locate the prostate and the regions with injury. A similar 
condition this work, real properties are used for all tissues in the region (in- 
cluding injured tissue). Then a pressure similar to that practiced by the doctor 
in the biopsy is practiced on the part of the inside of the rectum (Figure 1: 
Pressure face). 


Hip 


Bladder 


Fig. 1: 3-D pelvis system, organs a, b and among others are soft tissues. 


Figure 2 shows the displacements obtained, the behavior of linear-elastic 
deformation after establishing the biomechanical parameters for each organ. 
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Fig. 2: (a) Frontal and lateral view (b) with displacements values. 


3 Conclusions 


The present methodology allowed us to greatly improve the location of 
prostate motion during TRUS procedure. Because we can estimate with great 
precision the final location of the lesions observed during the Computerized 
Axial Tomography. The correct location of the lesions will allow biopsies of 
damaged tissues, so as to avoid false negatives in patients. 


Acknowledgement. This work was partially supported by grant No. 713679 from the 
European Union’s Horizon 2020 Research and Innovation Programme under the Marie 
Sktodowska-Curie grant agreement and from Marti i Franqués COFUND Fellowship 
Programme from Universitat Rovira i Virgili. 


References 


[1] Lindsey A., Torre, Freddie Bray, Rebecca L., Siegel, Jacques Ferlay, Joannie 
Lortet-Tieulent, Ahmedin Jemal. Cancer statistics. CA Cancer J Clin., 65: 
87-108, 2015. 


Wein AJ, Kavoussi LR, Novick AC, Partin AW, Peters CA. Campbell-Walsh 
Urology. Elsevier Saunders, Philadelphia, 2012. 


[3] W. A. D. Anderson. Pathology. The C. V. Mosby Co, St. Louis, MO, 1953. 
[4 


[2 


—= 


= 


Yi Wang, Dong Ni, Jing Qin, Ming Xu, Xiaoyan Xie, Pheng-Ann Heng. Patient- 
specific deformation modelling via elastography: application to image-guided 
prostate interventions. Scientific Reports, 27386:6, 2016. 


[5 


= 


Aster. Electricité de France,Finite element code aster , Analysis of Structures 
and Thermodynamics for Studies and Research, 1989-2017. 1989-2017. 
https://www.code-aster.org/ 


A Glance Of The Performance Potential of 
Serverless Computing 


Ammar Mohammed Okran * 


Department of Computer Engineering and Mathematics, URV, Tarragona, Spain 
ammar.okran@urv.cat 


1 Introduction 


Cloud computing, also known as "Serverless Computing", has become 
widely popular during the recent years and it has already gained a large 
amount of users as they can access to a large pool of services (such as storage, 
computing power or applications) over the internet avoiding the complexity 
and cost of maintaining their own infrastructure. These resources are pro- 
vided on demand and the payment is made according to the amount of use. 
One of the advantages of cloud computing is that it facilitates the issue of 
the resources in many aspects as the users can store and access data online 
whenever they want and wherever they are, with no need to their computer’s 
hard drive. This means that data is stored in a remote place and it could be 
accessed from anywhere. Cloud service providers such as Google, Microsoft, 
Amazon or IBM are racing in order to provide a good service to the potential 
users. All of them are well aware that this guarantees to continue and compete 
in the market and attract a lot of customers. However, this can only be done 
through the simplicity, elasticity, and performance efficiency of services. 

To do this, we will perform several experiments using the services provided 
by the IBM cloud platform, executing an algorithm that works with LIDAR 
data and categorizing all the times involved from the beginning to the end 
of each experiment. The results obtained will give us a clearer vision of the 
potential performance of this platform and will help us identify possible areas 
for improvement. 


2 IBM Cloud 


IBM cloud is a range of cloud computing services provided by the IBM 
company and it includes platform as a service (PaaS), software as a service 
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(SaaS) and infrastructure as a service (IaaS). This allows users to deploy cho- 
sen services and client content, including client applications and data within 
the IBM cloud environment. At the present moment, IBM provides 18 avail- 
ability zones in six global regions across the world located in the United States 
(Washington DC and Dallas), UK, Germany, Australia and Japan. 

To perform the simulations, we have assumed PDAL library[2] and Pywren- 
ibm tool [4] that is an advanced extension of PyWren|3], an open source project 
that executes user’s Python code and its dependencies as serverless actions on a 
serverless platform. The new features that the Pywren-ibm tool provides range 
from automatic data discovering, data partitioning among others. However, it 
is still in a development stage and it has to face many challenges before it 
becomes completely stable. 

Particularly, the services that we have also assumed are IBM cloud func- 
tions (FaaS) and IBM cloud object storage (COS). On one hand, FaaS (Func- 
tions as a Service) is a subset of serverless computing service that allows to 
implement the application code in response to events without the need to 
know the complex details of the infrastructure, especially related to building 
and launching micro-services applications. On the other hand, COS is a storage 
service that can be managed to store unstructured data in a reliable manner, 
where is designed for high durability, flexibility and security. These services 
allow us to launch from 1 function (sequential execution) to many functions 
(parallel execution). 

Commonly, a parallel execution in the IBM Cloud system behaves as fol- 
lows: (1) the user uploads a file of a certain file together with the algorithm 
that processes this data from the local PC (Personal Computer) to the COS; 
(2) the system reads the data file from the COS; (3) the system partitions 
the data file in several chunks depending on the number of chunks that the 
user has specified; (4) the system uploads the splitted files into the COS and 
launches one Map() function per every created chunk ; (5) the system executes 
every function with its partial data; (6) the system waits until all the func- 
tions are completely executed; (7) the systems merges all the partial results 
in a single file using a Reducer() function; (8) finally, the user downloads the 
resulted file from the COS to the local PC. 


3 Classification of Times 


The total time needed to perform an experiment in the IBM cloud ranges 
from the moment that an user uploads the algorithm and its data to the mo- 
ment that the selected cloud server produce the final result. We have identified 
several partial times involved in the total time of any experiment. These times 
are the following: 


1 PC Uploading Time (A Time): the time needed to upload a file from 
the local PC of the user to the COS on the Cloud. 


A Glance Of The Performance Potential of Serverless Computing 23 


iw) 


Partitioning Phase Time (B Time): this category consists of three 
partial times: 


a) COS Downloading Time (B1): the period of time that the partitioning 
function reads the data file from C'OS in order to partition it into small 
chunks. 

b) Partitioning Time (B2): the period of time of splitting the file into 
many files. 

c) COS Uploading Time(B3): the period of time of uploading the splitted 
files after the partitioning process into the COS. 


3 Processing Phase Time (C Time): this category consists of four partial 
times: 
a) Mapping Time(C1): the time each function takes to read its data from 
COS. 
b) Processing Time(C2): the time of each function takes to process its 
data. 


c) Obtaining Results Time(C3): the period of time needed by the reducer 
to get all the results from all the executed functions. 
d) Reduction Time(C4): the time that the reducer takes to finish. 


4 Analysis of Results 


Our basic experiment will be performed in the IBM cloud system assuming 
an algorithm described in [1] that deals with a LIDAR data [5] file of 256 
Megabytes (a map of several square kilometers of the Tarragona area in Spain) 
to remove outlier points that are classified as a noise. The same work will be 
done several times but varying the number of functions that will process the 
original LIDAR data file and partitioning it into portions of the same size. 
Particularly, we have performed several simulations assuming from 1 function 
(with a file of 256 Megabytes) to 1024 functions (each function with a partial 
file of 256 Kilobytes). The results obtained in these simulations can be seen 
in Figure 1. In this way, for each of the time categories identified in Section 3, 
we can identify the following behaviors: 

Uploading Time (A): The Pywren-ibm partition approach only supports 
limited number of files types. In this case, .tat and .csv files. Therefore, we 
have forced to convert the .las file supported by LIDAR into a .csv file before 
uploading it to the COS. This causes an increase in the file uploading time (A) 
of approximately 3 times as the size of the converted file has been increased. 
As it is shown in figurel, the uploading of the data file implies a significant 
and similar time for all simulations except for the one that only assumes the 
execution of a single function. This is because with one function execution 
there is no need to make a partition of the file and therefore it is not needed 
to perform a format file conversion in the user’s local PC. 
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Fig. 1: Execution time depending on the number of functions 


Partitioning Time (B): The Pywren-ibm partitioner takes the file data 
and splits it in several chunks according to the size specified by the user. Then 
it computes the ranges of data and assigns one range per every function. Later, 
each function is responsible of requesting request its data according to the 
assigned range. Notice that this partitioning method do no need to download 
and then upload the files from the COS. Nevertheless, we can observe that the 
time of partitioning represents a small portion of the total time involved on 
the execution process. 

Processing Time (C): In this case, we can observe a decrease of the 
Mapping Time (C1) and the Processing Time (C2) as the number of functions 
launched are increased. This is due to the fact the size of data that each func- 
tion assumes is getting smaller and therefore reading and processing data need 
less time. On the other hand, Obtaining Results Time (C3) is increased when 
number of functions is also increased. Notice that every function needs to send 
the result to the reducer and therefore it can be a communication overhead 
when the number of requests are increased. Finally, Reduction Time (C4) is 
quite similar in all the simulations but we can observe that together with the 
Uploading Time (A) are the partial time that most contributes to total time. 
Definitively, that means that both times are targets for improvement. 


5 Conclusion and Future work 


In this work we have intended to provide a first approximation of the per- 
formance potential that a Serverless Computing System can achieve. Based on 
this preliminary results we identify a set of aspects of improvement that well 
explored can improve the overall performance of the system. This further anal- 
ysis will allow users and developers to understand how to take advantage of 
this approach. Therefore, the next steps in this research effort will be focused 
on analysing the following issues: 
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Type of Algorithm: We are currently working with LIDAR data and 
with an algorithm that removes the noise of a map. We plan to analyze 
other algorithms that work with LIDAR data and even with algorithms 
that works with other type of data (for instance, bioinformatic or geospatial 
data). 

Variability of Data: Here, the idea is to analyse how the variability of 
the data affects performance when assuming the same algorithm for all of 
them. 

Data Size vs Number of Functions: To perform an experiment in 
a cloud system, it is required to assume a file with a certain size and 
partition it to be processed by another certain number of functions. We 
aim to study both variables at the same time (data size vs number of 
functions). Therefore, files from a few megabytes to a large of gigabytes 
will be assumed and for each of them several simulations will be carried 
out varying from a single function to hundreds of them. 

Monetary Cost of Executions: In this case, we plan to compute the 
monetary cost for any potential experiment with a customize configuration 
and find out which is the optimum trade-off between monetary cost and 
performance. 

Type of Reduction: The Pywren-ibm tool supports two options for mak- 
ing the reduction of the functions: a serial and a parallel approach for all 
Map() functions. We plan to analyze both approaches in detail and propose 
modifications to both of them in order to achieve the minimum possible 
reduction time. 

IBM Cloud Servers: the simulations performed in this work are done 
in the IBM cloud site located in Washington DC. We want to analyse in 
detail all available IBM sites to identify at any moment and depending of 
the type of the problem, the best site to perform a simulation. 

Cloud Servers Providers: Similar to the previous analysis, we want 
to extend our simulations to other cloud service providers such as Google 
(Cloud Functions), Microsoft (Azure Functions) or Amazon (AWS Lambda) 
to see how the server provider impacts in the performance efficiency. 
Time of the Day: We plan to make several simulations in different times 
and days to see if it significantly affects the moment when an execution is 
performed. 

Type of Partitioning: At the present moment, we assume the default 
partitioning method that the system provides. We plan to explore other 
approaches as making our own partitioning method with a dedicated func- 
tion for this task or avoiding the physical partition of the file. That means 
that every function receives the full original file but each of them only 
processes a particular amount of data. 

Dynamic and Transparent Framework: based on the results of the 
previous analysis, we plan to build a system that eases the user from the 
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tasks of deciding the best cloud site at any moment, the type of parti- 
tioning, the optimal number of functions, the best time of day to run an 
execution, etc. In this way, based on the requirements of the user (mainly 
monetary cost and time to obtain the results), this new framework will try 
to find the optimum way to launch an execution. 
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1 Introduction 


In this work, aiming to become a PhD Thesis, we are extending a partially 
proved conjectured for a special kind of numberical semigroups, ordinary semi- 
groups, to an even more special kind, the quasi-ordinary semigroups. The main 
idea is approaching this problem paralleling the mentioned families of semi- 
groups by using both analytical procedures and the construction of different 
trees and forests whose nodes are numerical semigroups. 


2 Numerical semigroups 
Let No denote the set of non-negative integers. 


Definition 1. A numerical semigroup is a subset of No which is closed under 
addition, contains 0, and its complement in No is finite. 


Example 1. The amount of money one can get from a cash point when the 
machine only withdrawls nothes of 20 and 50 (divided by 10): 


S = {0,2,4,5,6,7,8, 3} 


Example 2. In a popular hamburger shop, you can only buy packs of chicken 
nuggets with either 4, 5 or 9 pieces inside. What different numbers of chicken 
nuggets can you obtain? 


S = {0,4,5,8,9, 10,12, >} 


Example 3. Numerical semigroups model Weierstrass non-gaps, an approach 
to algebraic geometry and AG codes. 
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Example 4. They appear in music theory, as they are the inherent structure 
of the set of numbers of semitones of the intervals of each overtone of a given 
fundamental tone with respect to the fundamental tone, when the physical 
model of the harmonic series is discretized into an equal temperament. 


Definition 2. Let S be a numerical semigroup. 


. The elements in No \ S are called gaps. 

. The number of gaps is the genus of S. It will be denoted as g. 

. The largest gap is called the Frobenius number. Jt will be denoted as F. 

. The non-gap right after the Frobenius number is called the conductor. It 
will be denoted as c. 

5. The multiplicity is its first non-zero non-gap. It will be denoted as m. 

6. The generators of a numerical semigroup are those non-zero non-gaps 

which cannot be obtained as the sum of two smaller non-gaps. 


NMG wh 


Remark 1. Clearly, c= F+1 


Example 5. Let S be the semigroup seen in Example 2. 


1. Its gaps are the elements of the set {1,2,3,6, 7,11} 

- 9(S) =6 

eS )yy = 11 

2009) S12 

.m(S)=4 

. The generators of S are the elements of the set {4,5,9}. We will write it 
this way: 


HD oF KR W DD 


S =< 4,5,9 > 


Definition 3. A numerical semigroup other than No is said to be ordinary if 
its gaps are all in a row. 


Example 6. Let S =< 3,4,5 > given by its generators. The semigroup it would 
be S = {0,3, >}, wihch clearly is ordinary as its gaps, {1,2,3}, are in a row. 


Definition 4. A numerical semigroup is said to be quasi-ordinary if m = g, 
that is to say, there is only one gap larger than m. 


Definition 5. Let S be a semigroup with Frobenius number F. 


1. The sub-Frobenius number of S is the Frobenius number of SU {F}. 
2. The subconductor of S is the smallest nongap in its interval of nongaps 
immediatelly previous to F. 
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3 Ordinarization transform and the ordinarization number of 
a numerical semigroup 


Definition 6. The ordinarization transform of a non-ordinary semigroup S 
with Frobenius number F and multiplicity m is the set S’ = S\ {m}U {F}. 


We can iterate this process until we obtain an ordinary semigroup {0,9 + 
1,g+2,—}, where g is the genus of S, iterating 2 times. This number is called 
the ordinarization number of S. 


Example 7. Let S =< 2,5 >= {0,2,4,5, >}. We can easily see that m = 2 and 
F = 3. Then, S’ = {0,3,4,5, >}, which is ordinary. Besides, its ordinarization 
number is 1. 


3.1 Conjecture 1 


Let ng, be the number of numerical semigroups of genus g and ordinar- 
ization number r. For each genus g < 49 it has been computed in [1] ng, for 
each ordinarization number r from 0 up to [$]. One can observe that for each 
ordinarization number r, the number of semigroups for this r increases with 
the genus or stays the same. By extending the definition of ng, for r > [$] 
by setting ng = 0 in this case, this leads to the next conjecture: 


Conjecture 1. For each genus g € No and each ordinarization number r € No, 


Ngr S Ng+1,r 


Remark 2. Conjecture 1 has been partially proved in [1] 


4 Quasi-ordinarization transform and the quasi-ordinarization 
number of a numerical semigroup 


Definition 7. The quasi-ordinarization transform of a non-ordinary semi- 
group S with multiplicity m, genus g and sub-frobenius number f is the set 


S’= SU{f}\ {m}. 


The quasi-ordinarization transform of a non-ordinary semigroup of genus g 
and conductor c can be applied subsequently and at some step we will attain 
the quasi-ordinary semigroup of that genus and conductor, that is, the numer- 
ical semigroup {0,g,g+1,...,c—2,c,c+1,—}. The number of such steps is 
defined to be the quasi-ordinarization number of S. 
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Example 8. Let S =< 2,11 >= {0,2,4,6,8,10,11,}. We can easily see 
that m = 2 and f = 7. Then, S’ = {0,4,6,7,8,10,11,-}, which is not 
quasi-ordinary yet. In this case, m = 4 and f = 5. Therefore, S” = 
{0,5,6, 7,8, 10,11,—}. S” is quasi-ordinary, and its quasi-ordinarization num- 
ber is 2. 


Remark 8. 1. The quast-ordinarization of either an ordinary or quasi-ordinary 
semigroup is defined to be itself. 

2. In the ordinarization and quasi-ordinarization transform we replace the 
multiplicity by the largest and second largest gap, respectively, and we 
obtain numerical semigroups. In general, if we replace the multiplicity by 
the third largest gap, we do not obtain a numerical semigroup. 


We denote by ngq the set of numerical semigroups with genus g and quasi- 
ordinarization number q. 


4.1 Conjecture 2 


We can observe a behavior of ngq very similar to the behavior of ng), 
introduced in [1]. The purpose of this work is paralleling the Conjecture 1, 
that is, we conjecture that for each genus g € No and each quasi-ordinarization 
number q € No, 

Ngq S Ng+14- 


5 Conclusion 


What has to be proved exceeds by far the length of this extended abstract. 
By fixing a genus g, we can define a graph whose nodes are all semigroups of 
such genus and whose edges connect each semigroup to its quasi-ordinarization 
transform, supposing it’s not itself. The graph is a forest F, rooted at all 
ordinary and quasi-ordinary semigroups of genus g. In particular, the quasi- 
ordinarization transform defines, for each fixed genus and conductor, a tree 
rooted at the quasi-ordinary semigroup for that genus and conductor. This 
forest will lead us to proof Conjecture 4.1. 
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1 Introduction 


The following approach to the protection of a graph was proposed by Cock- 
ayne et al. [5]. Suppose that one or more entities are stationed at some of the 
vertices of a graph G and that an entity at a vertex can deal with a problem 
at any vertex in its closed neighbourhood. In general, an entity could consist 
of an observer, a robot, a guard, a legion, and so on. 

Informally, we say that a graph G is protected under a given placement 
of entities if there exists at least one entity available to handle a problem at 
any vertex. Various strategies have been considered, under each of which the 
graph is deemed protected. As we can expect, the minimum number of entities 
required for protection under each strategy is of interest. 

The simplest strategies of graph protection are related with the concepts of 
domination and total domination of graphs. In such cases, the sets of vertices 
containing the entities are dominating sets, total dominating sets or their 
variants, respectively. Next, we expose some of these strategies. Typically, 
a vertex in a graph G = (V(G), E(G)) dominates itself and its neighbouring 
vertices. A subset S' C V(G) is said to be a dominating set of G if S dominates 
every vertex of G, while S is said to be a double dominating set of G if S 
dominates every vertex of G at least twice. A subset S C V(G) is said to be 
a total dominating set of G if every vertex v € V(G) is dominated by at least 
one vertex in S \ {v}. The minimum cardinality among all dominating sets of 
G is the domination number, denoted by y(G). The double domination number 
and the total domination number of G are defined by analogy, and are denoted 
by yx2(G) and %(G), respectively. Likewise, a set S C V(G) of vertices of a 
graph G is a 2-dominating set if every vertex of V(G) \ S has at least two 
neighbours in S. The 2-domination number of G is the minimum cardinality 
among all 2-dominating sets of G, and it is denoted by 7(G). 
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A total dominating set S' is said to be a secure total dominating set of G 
if for each vertex u € V(G) \ S, there exists a vertex v € S adjacent to u 
such that (S \ {v}) U {u} is a total dominating set of G as well. The secure 
total domination number of G, denoted by ys:(G), is the minimum cardinality 
among all secure total dominating sets of G. This last concept was introduced 
by Benecke et al. in [1] and also studied in [2,3,4,6,7]. 

The problem of computing ys+(G) is NP-hard [6], even when restricted to 
chordal bipartite graphs, planar bipartite graphs with arbitrary large girth and 
maximum degree three, split graphs and graphs of separability at most two. 
This quotation suggests finding the secure total domination number for some 
special classes of graphs or obtaining tight bounds on this parameter. This is 
precisely the aim of this work in which we study the case of rooted product 
graphs. 


1.1 Notation 


All graphs considered in this work are finite and undirected, without loops 
or multiple edges. Given a vertex v of G, N(v) will denote the open neighbour- 
hood of v in G. The closed neighbourhood, denoted by Nv], equals N(v) U{v}. 
Given a set S C V(G), its open neighbourhood is the set N(S'}) = Uses N(v), 
and its closed neighbourhood is the set N[S] = N(S)US. 

We say that a vertex v € V(G) is a universal vertex if N[v] = V(G). A leaf 
of G is a vertex of degree one. A support verter of G is a vertex adjacent to a 
leaf, a strong support vertex is a support vertex which is adjacent to at least 
two leaves and a strong leaf is a leaf which is adjacent to a strong support 
vertex. The set of leaves, support vertices and strong leaves are denoted by 
L(G), S(G) and L(G), respectively. If there exists a leaf uv € L(G) \ L(G), 
then we say that v is a weak leaf, and the set of weak leaves is denoted by 
Lw(@) = L(G) \ Ls(Q). 


2 Some results on the secure total domination number of 
rooted product graphs 


In this section we present some results that allow us to analyse the secure 
total domination number of rooted product graphs. Firstly, we introduce the 
definition of this specific product graph. 

Given a graph G of order n and a graph H with root vertex v, the rooted 
product G o, H is defined as the graph obtained from G and H by taking 
one copy of G and n copies of H and identifying the i” vertex of G with the 
vertex v in the i” copy of H for every i € {1,2,...,n}. If H or G is the empty 
graph, then Go, H is equal to G or H, respectively. In this sense, to obtain 
the rooted product Go, H, hereafter we will only consider graphs G and H of 
orders greater than or equal to two. 
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Theorem 1. For any graphs G and H with no isolated vertex and any vertex 
ve V(A), 

Yst(G Oy H) < n(G)yst(H). 
Furthermore, if v € V(H) \ S(H), then 


Yst(G Oy H) < Yst(G) + n(G)yse(H — {v}). 


The following result states the intervals in which the secure total domina- 
tion number of a rooted product graph can be found. 


Theorem 2. Let G and H be two graphs with no isolated vertex. At least one 
of the following statements holds for every verter v € V(H) \ Ly (HA). 


(i) Yst(G Oy H)= = n(G)Yse(H). 

(ii) n(G) (yse(H) — 1) S Yst(G ov 1) < Yse(G) + n(G)(Yse(Z) — 1). 

(iii) 2y2(G@) + n(G@)(yse(H) — 2) < Yst(G ov H) < Yse(G) + n(G) (Yee) — 2). 
(iv) ¥x2(G) + n(G)(Yst(Z) — 2) < Yst(G ov H) < Yse(G@) + n(G)(yse(H) — 2). 


The following result is an immediate consequence of theorem above. 


Theorem 3. Let G and H be two graphs with no isolated vertex and v € 
V(A) \ Lw (A). If yst(H — {v}) = Yst(H) — 2 and yst(G) = ¥x2(G), then 


Yst(G ov H) = Yst(G) + m(G)(Yst(H) — 2). 


We now consider some particular cases in which we impose some additional 
restrictions on H. We begin with the cases in which v € S(H) UL,(#). 


Theorem 4. Let G and H be two graphs with no isolated vertex. The following 
statements hold. 


(i) Ifve S(H), then Yst(G Oy H) = n(G')yst(H). 
(ii) [fv € Ls(A), then yst(G oy H) = y(G) + n(G)(yse(H) — 1). 


Finally, we analyse the secure total domination number of rooted product 
graph G o, H for the case in which the root v is a universal vertex of H. 


Theorem 5. Jf G is a graph with no isolated vertex, then for every graph H 
having a universal verter v € V(#), 


Yst(G Oy Hf) = n(G)ys¢(). 


The corona graph G © G’ is a rooted product graph Go, H where H 
Kk, +G’ and v is the vertex of Ky. 


IIe 


Theorem 6. Jf G is a graph with no isolated vertex, then for every graph G’, 
yst(G © G’) = n(G)(7(G’) +1). 
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1 Abstract 


The objective of this paper is to describe an approach to detect the slip 
and contact force in real-time feedback. In this novel approach DAVIS camera 
used as a vision tactile sensor due to its fast process speed and high resolution. 
Two hundred experiments were performed on four objects with different shape, 
size, weight and material to compare the accuracy and respond of the Baxter 
robot grippers to avoid slipping. The advanced approach is validated by using 
a force-sensitive resistor (FSR402). The events captured with DAVIS camera 
are processed with specific algorithms to provide feedback to the Baxter robot 
aiding it to detect the slip. 


Keywords: vision sensor; tactile sensor; baxter robot; real-time feedback; vision- 
based slip detection 


2 Introduction 


Though the sensors are good in detecting slip of the objects they will not be 
able to apply the minimum force required so that the object does not deform. This 
will be destructive in the case of fragile objects. The slip is detected only when a 
shear force is produced and the object slips away. The idea of using a camera to 
detect the slip will overcome these disadvantages. The input from the camera can 
also be used in conjunction with deep learning algorithms to benefit from incipient 
slip detection. The Baxter robot is an industrial robot built by rethink robotics [2]. It 
has two arms with an animated face. Since the robot is mainly used in assembly and 
manufacturing lines, a safety feature has been built into the robot which is capable 
of sensing humans near-by and restricting its motion. Thus it eliminates the need 
to be in a caged working environment unlike other robots. The Baxter achieves this 
by having multiple sensors in its hand and arms, while the motors actuate the joints 
through springs. The springs provide a cushioning effect and make it less hazardous as 
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it is able to reduce the force before collision. The Baxter robot is capable of learning 
complicated movements in a unique way. The robot’s hand can be moved physically 
by a person in the desired way and the robot is capable of memorizing the motion and 
replicating it. There is a single mounted camera on Baxter last joint which can be used 
for teaching Baxter to do tasks [1]. For this project, a transparent gripper needs to get 
the events of the contacted area, therefore, a new gripper designed with acrylic and 
transparent silicone. The DAVIS camera is a unique type of camera, the method in 
which it acquires inputs sets it apart from the traditional cameras. It is inspired by the 
function of the human eye and tries to mimic it. Knowing how this approach benefits 
functioning for eyes, researchers are developing machine-vision frameworks in which 
every pixel modifies its own inspecting because of changes in the measure of incident 
light it gets. This method of capturing data is called level-crossing sampling. Each 
pixel is attached to a level-crossing detector and a separate exposure-measurement 
circuit. The amplitudes of each pixel are measured and checked in a reference to a 
previously measured threshold value or compared with the previous signal, if below 
the previous signal the new level is recorded and used as reference. In addition, the 
DAVIS camera can be used to detect the deformation of the object during grasping 
by the Baxter grippers [4]. This camera has a high-speed computer vision because it 
works mostly with event-based, which has low latency, high temporal resolution, and 
very high dynamic range [3]. 


3 Experimental Setup 


Since the event camera only gets pixels that change they have advantages over 
traditional cameras. They do not process redundant information (only captures pixels 
that change) hence they save a lot of memory, do not waste computational energy 
on redundant pixels, do not suffer from latency as they do not have to wait for 
unnecessary frames and has a very high temporal resolution, very high dynamic range 
and low power and bandwidth requirements. DAVIS240C camera has been used for 
detecting the slip of the object in a millisecond and then sending the feedback of the 
DAVIS camera to the gripper of the Baxter robot. DAVIS camera has been mounted 
on behind the grippers with a specific angle, which based on the brightness of the 
light, will detect the touching area of the object and gripper. Therefore, gripper must 
be transparent so the camera can see the object through the gripper, and gripper 
must be made of soft tissue materials so that by deformation of the gripper camera 
will detect whether or not slip is happen. Transparency and thickness of the silicone 
grippers are quite important in this task because much transparency can cause a lot of 
noise in the background of the object. If the thickness of the silicone is big, the contact 
area will not be clear in the DAVIS camera and if the thickness is too thin, then it 
will not capture the deformation the silicone and object can slip easily through the 
grippers. To prove that force is changing in Baxter grippers, force sensitive resistor 
has been set and coded in Raspberry PI 2B to validate the value of the force. The 
FSR402 is a Force Sensitive Resistor which is a robust polymer thick film (PTF) 
device which shows a decrease in resistance by increasing the force applied to the 
sensor surface. FSR has been developed to use in electronic devices, which needs to be 
controlled by human touch such as automotive electronics, medical systems. FSR402 
will show the force in a range of 0 to 1024, although this number can be changed 
based on a resistor that has been set for this sensor. This resistance depends on the 
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amount of pressure that will be applied on the sensor which is more force will cost 
less resistance and less force will make high resistance and if there is no pressure on 
the sensing area, the resistance will be bigger than 1M. This sensor can sense applied 
force anywhere in the range of 100g-10kg. Robot Operation System, known as ROS, is 
an open-source mixture of middle ware and operating systems for robots. This meta- 
operating system provides the services that expect from a robot operating system 
such as hardware abstraction, low-level device control, implementation of commonly 
used functionality, message passing between processes, and package management. 
One of the most advantages of ROS is that it can provide a huge amount of tools 
for configuring the setup for the robot, start programming, testing and analyzing the 
results debugging the problems and many more possibilities. In addition, ROS has 
the availability to use many libraries that implement useful robot functionality, with 
a focus on mobility, manipulation, and perception. 


Fig. 1: Assembly Setup on Baxter Gripper 


Analogue image processing will analyze and manipulate the hard copy of the 
image. However, digital image processing will analyze and manipulate the image in a 
computer with image processing software. The noise filter is used to filter out noises 
in the background of the frame that is focused upon. The noise filter works by using 
the concept of hot pixels. Hot pixels can be considered as an abnormal behavior 
of pixels. An object with a specific size would produce at least more than two or 
three pixels depending upon the distance. If the object is too far away it is entirely 
a different story. Any change of an object very far away would be represented as 
a single pixel. For learning which pixels are hot pixels or not, the two parameters, 
which is explained above, is taken as an input form the user. The time period in 
microseconds to accumulate events for learning hot pixels and the number of events 
needed in a learning time period to consider a pixel hot. The contact area is the area 
of the object that is touching the silicone in the gripper. Finding out the contact area 
helps to determine the slip of the object efficiently. The contact area is determined by 
capturing the events from the camera when the gripper is closing in on the objects. 
Objects coming in contact with silicone will produce negative polarity events at the 
point of contact because they block the incoming light to the camera. Based on the 
results from image processing, if the camera detects the slip happening, then, the 
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gripper position will be reduced to prevent slip. This was the feedback between the 
camera and the Baxter robot and is done with socket programming. However, the 
important part is gripper must know how much needed to be closed or opened to 
stop slipping. Because if the gripper position is being reduced too much, then it will 
deform or even break the object. For finding the best values for the gripper position 
in each period, the PID controller has been programmed for this task, which will be 
explained in the next section. 


4 PID Control 


The PID controller is employed in this research as a means of feedback and to 
control the force of the grippers. The gripper position parameter is used as a means 
of controlling the force. The PID controller works by receiving an error, which is 
a deviation from a specified set point and adjusts the control outputs in order to 
drive the error to zero. The three values namely, proportional, integral and derivative 
determine how quickly and accurately the error is driven to zero. The error in this 
setup would the number of positive polarity events captured in the contact area while 
the program is running. Also, the error has to be updated after each timeframe, called 
the sample time. The sample time chosen for this setup is one millisecond. 


5 Results 


For each object, two parameters of the gripper position and force sensor value have 
been discussed. The first graph represents the gripper positions of the gripper and the 
second graph represents the force sensor values. Based on the 200 experiments, cube 
shape objects have more accurate results than spherical objects because the contact 
area is bigger. Because of the vibration of robot arms, there are a little fluctuation 
in force sensor values. 


18.5 


“0 20 40 60 80 100 120 140 0 100 200 300 400 500 


Fig. 2: Gripper position and Force sensor values for plastic box 110 gr 


Sometimes there is a difference between number of force values and gripper posi- 
tion values, because the times, which slip is not happening, the results will be same 
and gripper will not count the values, but force sensor will capture all values till one 
cycle period time will be down, which in this experiment all cycle were 20 seconds. 
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Fig. 3: Gripper position and Force sensor values for plastic box 1360 gr 


6 Conclusion 


In this paper, a novel method approach was proposed to detect the slip and contact 
force in real-time feedback. DAVIS camera was used as a vision tactile sensor for fast. 
process and high resolution. Socket programming was also used to get close-loop in 
real-time feedback. The advanced approach is validated by using a force-sensitive 
resistor (FSR402). The events captured with the DAVIS camera were processed with 
algorithms to provide feedback to the Baxter robot aiding it to detect slip. All devices 
are synchronized to avoid any delays. PID controller was used for stabilizing the 
system to get more accurate results. 
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1 Introduction 


Piping and Instrumentation Diagrams (P&IDs) are frequently used for the 
representation of the structure and components in Oil & Gas industry. Last 
years, there has been an increased effort to migrate these printed engineering 
drawings towards a digital environment throw collaborations between indus- 
trial partners and the research community”. This has been tested by numerous 
authors [3],[10],[6],[7],[8],[4], and some methods have appeared which automat- 
ically generate a CAD document given the paper sheets. Due to the complexity 
of this process and the need of being a zero error process, this final CAD doc- 
ument is always verified by an engineer. 

This paper is an abstract of article [9] in which we present a method that 
reduces the required engineer working time throw the automatic detection of 
possible incorrectly identified components in the generated CAD model. 


2 CAD model generation and automatic P&ID validation 


Figure 1 shows the general schema of the method we propose in [9]. We 
explain below the main characteristics: 

Given a Paper P& JID, the generation of an Automatic CAD model repre- 
senting its information requires two main steps. Firstly, the digitisation step 
consists on detect and localise the main shapes of the P&ID and create a netlist 
containing the positions of each element. Secondly, the contextualisation step 
makes sense out of the previously digitised data describing the connections 
and relations between the components. 

After the Digitisation & Contextualization step, the Automatic CAD needs 
to be validated to guarantee the correctness of the obtained components and 
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Fig. 1: Main schema of the method presented in [9] to assist the human vali- 
dation of P&IDs. 


structures. Then the Automatic Validation module detects the possibly incor- 
rectly identified components in the Automatic CAD. 

This module is implemented defining an attributed graph representing the 
P&ID. Attributed graphs are commonly used in pattern recognition processes 
and they are a powerful tool (see reviews [1,11,5,2]). In our situation, the nodes 
represent the components and the edges represent the pipelines connecting 
them. Morover, nodes have only one attribute which is the component identity 
and edges are undirected and unattributed. The graph is embedded in an 
Euclidean space and then a neural network is applied. Then, a verification 
module analyses the results and proposes the possible incorrectly identified 
components in a Highlighted CAD. The human expert just need to validate 
these elements and the property of zero error process is maintained with a 
considerable reduction of work. 


3 Conclusions and future work 


We have developed a method to reduce the effort in the human validation of 
Automatic CAD models representing P&IDs. The experimental results show 
that we achieve a reduction of approximately the 40% of human effort keeping 
a zero-error process. 

As a future work, we want to analyse different configurations of the defined 
neural network and analyse if a deep neural network could be used. We also 
want to apply the method to P&IDs of other industries and we believe our 
method could really decrease the economical and temporal cost of this task. 
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1 Introduction 


Diabetes Mellitus (DMs) is a chronic disease that affects 382 million pa- 
tients worldwide (2013 data) and is predicted to increase to as many as 592 
million adults by 2035. In Spain, we predict an incidence of over 3 million 
DM sufferers by 2030 [3]. People suffering from DM may have different de- 
rived health complications, in particular, we focus on studying the Diabetic 
Retinopathy (DR) disease. Diabetic Retinopathy causes lesions in the eye, in 
particular, in the retina. If they are not detected and treated in due time, it 
can cause blindness. DR is one of the significant causes of blindness in young 
adults around the world [1]. 

A screening test of the eye fundus by means of images obtained with a 
non-mydriatic fundus camera can reveal the presence of lesions in the eye. In 
Catalonia (as well as Spain and many European countries), annual screening is 
recommended for diabetic patients. However, this treatment is costly and not 
feasible. Due to a large number of patients, at the moment, Catalan centers 
are only able to screen each patient on average every 2.5 years. Given the 
incidence of the disease is about 10%, most of the tests are unnecessary. Family 
doctors, who are in charge of continuous monitoring of diabetic people, are not 
experts in the diagnosis of DR. For this reason, they are not able to decide 
not performing a screening analysis on a particular patient. Not performing 
proper filtering causes an excessive increase in the cost of screening. Moreover, 
patiently attend unnecessary tests, with a loss of time for the patients as well 
as for the experts that must analyze thousands of eye images collected [2]. 

In the last years, we have been working on the development of an intelligent 
decision support system that estimates the risk of developing DR using the 
data available in the Electronic Health Care record of diabetic patients. De- 
pending on the risk level, the system recommends the most appropriate time 
for the next visit. 
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In a study of the data available in EHR, the most significant risk factors 
were identified [4]. They correspond both to physical conditions (e.g. age or 
sex), to substances detected in blood or urine analysis (e.g. microalbuminuria) 
and to the variables related to the patients’ treatment (e.g. control of arte- 
rial hypertension or ingestion of oral anti-diabetic medication). Finally, nine 
factors were selected. 

The goal is that family physicians have a tool integrated with their usual 
software at the hospital that can be executed in order to calculate the degree 
of DR risk. 


2 A decision support system based on fuzzy rules 


This clinical decision support system consists of a set of fuzzy rules. In 
particular, we have a model that is a Fuzzy Random Forest, consisting of a 
set of 100 trees. 

Machine learning algorithms are used to construct the fuzzy rules of each of 
the trees. In particular, we use a modified version of the Yuan&Shaw algorithm 
to minimize a measure of classification ambiguity based on fuzzy sets. The in- 
ference method applied to the ruleset is the Mamdami procedure combined 
with an aggregation operator based on Choquet integral [5]. The classifica- 
tion model build achieves an accuracy of 80%, with sensitivity of 81.3% and 
specificity of 79.7%. 


3 Retiprogram at Hospital Universitari Sant Joan de Reus 


Hospital Universitari Sant Joan de Reus (HUSJR) is the leading health 
center located in Reus (Tarragona province). This hospital serves an area of 
Catalonia with a population of 247174 inhabitants. The total number of dia- 
betic patients (with type-2 DM) screened yearly has increased from 5000 to 
5500 in the last 10 years, from a total of 15811 T2DM patients. 

In cooperation with the computer engineers working at HUSJR, the Retipro- 
gram is now included in the toolset of the desktop software of the doctors. 
Retiprogram also works as a web service, and it has been installed in a server 
of the hospital. A graphical interface has been developed following the rest of 
the software of the hospital, which helps doctors to be familiar with it and ef- 
ficiently work with this new system. The frame is as simple as possible, having 
the input data in the left of the window and the output data at the right, as 
shown in Figure 1. 

Most of the 9 input values are automatically retrieved from the EHR that is 
stored in the hospital database. If their values are missing or must be updated, 
the doctor introduces the new values, and the system performs the calculation. 
This tool shows if there is a risk or not and indicates the degree of confidence 
in the answer. 
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Fig. 1: Retiprogram at HUSJR. 


As the system is not completely perfect in its predictions, the doctor must 
confirm or not the answer obtained. For this reason, the user can introduce 
comments in order to justify the decision made, especially in case of not fol- 
lowing the recommendation given by the system. 


4 Future Work 


We are collecting now the feedback from the doctors in the use of the 
Retiprogram system. After a period of test, we will analyze the performance 
of the system and the comments given by the doctors in the cases where the 
model gives an unreasonable answer. We are also working on the explainability 
of the output given by the classification system, which will show to the doctor, 
which are the risk factors that should be corrected. 
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1 Introduction 


Several researchers consider that students’ attitude towards mathematics 
and teachers’ efficacy are an important factor in high-quality mathematics, 
since it is related to students’ academic performance (e.g., [1], [2], [3]). 

According to some researchers, a teacher must have positive attitudes and 
beliefs towards mathematics in order to achieve a successful teaching activity 
(e.g., [4]). Specifically, these researchers mentioned that the attitudes and be- 
liefs of trainee teachers would influence their teaching in the future and also 
the attitudes and beliefs of their students. 

The following research works study the teachers’ efficacy beliefs and atti- 
tude towards mathematics. In [5], they studied the relationship between the 
mathematical anxiety of elementary school teachers, math teaching practices, 
and students’ performance in mathematics. They showed that low-level be- 
liefs in self-efficacy to teach math could cause mathematical anxiety, which, 
at the same time, can negatively influence student performance. In addition, 
they found a positive relationship between math anxiety and anxiety about 
teaching mathematics. Moreover, they found that the increase in student per- 
formance in mathematics was related to lower levels of anxiety from teaching 
mathematics, but was not related to general anxiety about mathematics. 

More recently, in [6], they investigated the characteristics of primary school 
teachers, comparing their beliefs of self-efficacy in teaching mathematics. The 
author stated that teachers with a greater belief in self-efficacy show a higher 
level of effort and persistence with students, being more open to new ideas 
and new methods. In addition, these teachers believe in students’ academic 
achievements and take responsibility for student success. 

Besides, in [7], they examined the relationships between the efficacy of 
primary math teachers with the self-efficacy and mathematical performance of 
their students. Their results showed that math teachers’ beliefs of efficacy have 
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a significant influence on the self-efficacy and performance of their students in 
mathematics. 

Taking into account these previous works, the objective of this research is 
to study the relationship between the teachers’ efficacy beliefs and the attitude 
towards mathematics of pre-service teachers. 


2 Method 


The participants in this study are the third-year students of the Bachelor 
Degree in Primary Education at University Rovira i Virgili who attended math 
class the second week in the 2019-2020 academic year. Specifically, the sample 
of this study corresponds to 56 students out of the 95 enrolled in the third 
year of the Degree of Primary Education (59% of the total population), 28% 
of the sample were men, while 72% were women. 

Two instruments were used in this research: 1) the Mathematics Teach- 
ing Efficacy Belief Instrument (MTEBI) for pre-service teachers [8], and 
2) the Attitude towards Mathematics Scale (AMS) for pre-service teachers 
[9]. The MTEBI consists of 21 items and it provides information on its two 
subscales, teaching self-efficacy (PMTE) and Mathematics Teaching Outcome 
Expectancy (MTOE). The AMS consists of 25 items and it has five subscales 
(pleasure, anxiety, usefulness, motivation and confidence). 

In order to determine the reliability of the obtained results, internal consis- 
tency was analyzed. It is important to indicate Cronbach’s alpha for attitude 
towards mathematics is excellent (a@ = 0.93) and for teachers’ efficacy beliefs 
it is acceptable (a =0.75). 


3 Results 


This research explores the relationship between mathematics teachers’ effi- 
cacy beliefs and attitude towards mathematics of primary pre-service teachers. 
Specifically, two research questions are considered in this study: 1. Is there a 
relationship between the mathematics teachers’ efficacy beliefs of pre-service 
teachers and their attitude towards mathematics? 2. Are there differences be- 
tween the mean the MTEBI and AMS? 

Question 1, Pearson’s correlation coefficient between the teachers’ efficacy 
beliefs and attitude towards mathematics is (p=0.54,p<0.01). This result 
indicates that there is a significant moderate correlation between beliefs and 
attitude. 

For a better interpretation of the correlation of the data, the scatter plot 
between the teachers’ efficacy beliefs and attitude towards mathematics is 
presented. Figure 1 shows that there is a direct linear relationship between the 
beliefs and the attitude of pre-service teachers with a stochastic dependence. 
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The straight line (red line) indicates the trend line. Besides, the lowess line 
(blue line) indicates the local regression, it is an adjustment of data curves by 
smoothing. 


Attitude towards mathematics 


Teachers’ efficacy beliefs 


Fig. 1: Scatter plot between beliefs and attitude 


Question 2, Figure 2 shows the mean of the values that each pre-service 
teacher gives to the teachers’ efficacy beliefs and attitude towards mathemat- 
ics. It can be seen that the lowest values are obtained in attitude towards 
mathematics. 
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Fig. 2: Individual pre-teacher average scores given to the beliefs and attitude 


Notice that the mean of the beliefs is 3.01 and standard deviation is 0.27, 
while the mean of the attitude is 2.80 and standard deviation is 0.68. Besides, 
the range of values of the attitude is clearly wider than the ones of the beliefs. 

Additionally, a t-test was performed to compare the means obtained in each 
test. Specifically, a two-tailed test was administrated, setting a = 0.05 asa 
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significance level (5%). The value of p is (p<0.001), the difference between 
the obtained means is statistically significant beliefs and attitude. 


4 Conclusions 


The purpose of this research was to study relationship between mathemat- 
ics teachers’ efficacy beliefs and attitude towards mathematics. To carry out 
the research, two independent instruments were used: the MTEBI, to measure 
the teachers’ efficacy beliefs and the AMS, to measure the attitude towards 
mathematics. Specifically, the study showed that the teachers’ efficacy beliefs 
and the attitude towards mathematics have a significant moderate correlation. 
Additionally, the value of p indicates that the teachers’ efficacy beliefs has the 
highest mean with respect attitude towards mathematics. The results indicate 
that we have to apply methodological strategies that help improve the attitude 
of pre-service teachers. 
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1 Introduction 


By the end of 2018, roughly 44 percent of European households were 
equipped with a smart meter (SM). According to [3], by the year 2023 the 
distribution will be around 71 percent, which means that over 200 million 
homes in Europe will be equipped with smart meters. 

With SMs the power usage can be observed in real time by the energy 
providers, thus allowing as much electricity as necessary to be produced in a 
specific time. However, this increase of efficiency comes at the cost of privacy. 


2 Non-Cryptographic Smart Meter Privacy 


The communication of smart meters with utility companies takes place 
over a public network (the Internet) and is regarded as unsafe unless appro- 
priate countermeasures are taken. That is why a lot of work in this area has 
been performed and is still ongoing on protecting privacy. Most researchers 
focus on cryptographic solutions, mainly Partially Homomorphic Encryption 
(PHE) and Fully Homomorphic Encryption (FHE). Although cryptographic 
methods are appealing, one should consider the limited computational capac- 
ity of a smart meter. With this in mind, non-cryptographic solutions seem 
worth exploring. 


2.1 Privacy models 


Non-cryptographic solutions are normally oriented to satisfying a privacy 
model, that is, a privacy condition. As stated by Domingo-Ferrer and Soria- 
Comas in [1], there are four major privacy models: Randomized Response, 
k-anonymity, Differential Privacy and Permutation Paradigm. 
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For the sake of brevity, only Randomized Response is illuminated here. Ran- 
domized Response (RR) was invented by Warner in 1965 [4] and views pri- 
vacy as deniability. RR allows obtaining answers to sensitive questions (e.g. 
did you take drugs?) while assuring the respondent’s privacy. The respondent 
is allowed to randomize her answer according to some prescribed probability 
matrix before reporting it. Thus, she can deny that the reported answer is her 
true answer, but at the same time the interviewer can use his knowledge of 
the probability matrix to estimate the true distribution of the answers of a set 
of respondents from the distribution of their reported answers. 


2.2 Aggregation and smart meters 


Aggregation is a useful principle to achieve privacy. It is also instrumental to 
reduce computation and communication costs, which is very welcome in smart 
meters because they have constrained resources. We summarize the overview 
on aggregation in smart meters given by Erkin and Tsudik in [2]: 


e Spatial aggregation. The smart meters are (geographically) clustered. 
e Temporal aggregation. Fine-grained data is summed up by the smart meter. 
e Spatio-temporal aggregation. Hybrid setting — a combination of both. 


3 Our Proposed Scheme 
3.1 Goals 


We propose a non-cryptographic solution that offers the following features: 


® privacy; 

e simplicity (lightweight, no need for a TTP); 

e support for individual readings (without aggregation); 
e low computational overhead; 

e low communication overhead; 

e integrity; 
e high accuracy. 


We demonstrate our approach with real data (from the London Data Store?) 
and using a realistic smart meter architecture (with respect to scalability). 


3.2 Rationale 


The diagram in Figure 1 compares cryptographic and non-cryptographic 
solutions. Our aim is to develop a scheme based on RR that can handle indi- 
vidual readings and maintain privacy at the same time. 


e https://data.london.gov.uk/dataset/smartmeter-energy-—use-data-in-london-households 
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Fig. 1: Cryptographic vs. non-cryptographic solutions 


4 Experimental Results 


4.1 Smart meter data set 


As stated above, we have used real data from the London Data Store®. 5567 
households were selected as a balanced sample of London’s society. Data sets 
were then released with the energy consumption measured every half hour, 
a unique household identifier and a time-stamp. For this paper the data set 
UKPN-LCL-smartmeter-sample.csv was used. 


4.2 Creating the RR matrix 


Let us denote by X the attribute containing the answer to a sensitive 
question. If X can take r possible values, then the randomized response Y 
reported by the respondent instead of X follows an r xr matrix of probabilities 


P=/|i:: (1) 
Pri°*** Prr 


where pyuy = Pr(Y =v|X =u), for u,v € 1,...,r denotes the probability that 
the randomized response is v when the respondent’s true attribute value is wu. 
We have constructed RR probability matrices with p = 0.4, p = 0.6 and 
p = 0.8 in the main diagonal. We have filled the off-diagonal probabilities by 
using three different attenuation formulae A, B and C on p. In total we have 9 
different probability matrices, corresponding to each value of p combined with 
each attenuation formula. Attenuations are as follows: 


e Attenuation formula A. Populate the matrix with p in the main diagonal. 
For the super-diagonal and the sub-diagonal divide p by 2. For the super- 
super-diagonal and the sub-sub-diagonal divide p by 4, and so on. Finally, 
rescale the probabilities in each row of the matrix for them to sum to 1. 


2 https://data.london.gov.uk/dataset/smartmeter-energy-—use-data-in-london-households 
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e Attenuation formula B. Populate the matrix with p in the main diagonal. 
For the super-diagonal and the sub-diagonal divide p by 1 plus the distance 
to the diagonal. Do the same for the super-super-diagonal and the sub-sub- 
diagonal, and so on. Finally, rescale the probabilities in each row of the 
matrix for them to sum to 1. 

Attenuation formula C. Populate the matrix with p in the main diagonal. 
For the super-diagonal and the sub-diagonal raise p to the power of 1 plus 
the distance to the diagonal. Do the same for the super-super-diagonal and 
the sub-sub-diagonal, and so on. Finally, rescale the probabilities in each 
row of the matrix for them to sum to 1. 


4.3 Analysis 


Our analysis consists in comparing the distributions. To transform contin- 
uous consumption data into categorical data amenable to randomization, we 
have split the range of consumption into 16 intervals. The diagram in figure 2 
present results corresponding to the average of 100 runs for matrix A. The 
abscissa in the diagram represents the intervals, and the ordinate the relative 
frequencies of values that fall in each interval. 

The blue line shows the frequencies of 7, the orange line the frequencies of 
mt for p = 0.4, the gray line 7 for p = 0.6 and the yellow line 7 for p © 0.8. 
The values of p are approximate due to rescaling. In general, lower values of p 
offer more privacy, at the cost of less accuracy. 


—e—pi 9 —e—pi hat forp=04 pihatforp=0.6 —e pi hat forp=08 


Fig. 2: Attenuation A. Original frequencies and 7 frequencies for p = 0.4, 0.6 
and 0.8 
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